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Abstract. We describe the representation of the chemical affinity be- 
tween the antigen-combining site of the immunoglobulin molecule and 
the antigen molecule as the probability of the two molecules existing in a 
bound state. Our model is based on the identification of shape attractors 
in the configuration space for the joint antibody / antigen combining site 
sequence. We parameterize configuration space in terms of Ramachan- 
dran angles. The shape attractors allow us to construct a Markov chain 
I ' whose steady state distribution gives rise to the desired attachment prob- 

er ability. As a result we are able to delineate the enthalpic, entropic and 

kinetic components of affinity and study their interactions. 
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fNj ' 1 Overview 
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^D • Structural and thermodynamic details underlying antibody-antigen interaction 
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reveal that the correlation between mutations, free energy improvements and 
increase in affinity are more complex and subtle and cannot be completely cap- 

f~S tured through local interactions. Mutations in non-binding residues may underlie 

increases in affinity due to long-range effects such as stabilization of charge dis- 
tributions, decreasing conformational strain, or perhaps creating a permissive 

i-^ ■ environment for further affinity improving mutations by altering accessible re- 

qh gions of the shape space being explored. Recent literature on the evolution of 

I* ■ affinity at the protein-protein interface has attempted to elucidate the more com- 

. ,-h ', plex nature of cooperative effects as well as the ability of diverse naive antibodies 

j^ ■ to exploit different thermodynamic pathways in order to bind a given antigenic 

*-j \ epitope [12, 10]. It is natural to extend these pathway explorations to the process 

of affinity maturation. 

Our goal is to decompose this search for higher affinity using a hierarchical 
model that attempts to capture the dynamics at the level of the protein-protein 
interaction. Individual immunoglobulin (Ig) genotypes, together with locally ex- 
pressed antigenic epitopes, are mapped to a set of 'shape attractors'. These 
attractors in turn generate a Markov chain, which models the thermodynamic 
relaxation of the 'binding pocket'. An 'affinity landscape' is constructed, by al- 
lowing the shape attractors to perform a coalescing random walk as a result of 
the hypermutation process. Evolving immunologic affinity is the resulting path 
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of the steady state probability of the bound state. Convergence analysis of this 
process decomposes affinity maturation into its cnthalpic, kinetic and entropic 
components. 

This paradigm seeks to capture the statistical mechanics of antigen-antibody 
interaction, rather than details of the mutational process. Emergence of alter- 
nate optimization pathways may facilitate understanding of evolved strategies 
of affinity maturation and why some optimization paths may be better suited to 
some types of antigens than others. 

2 Background 

Primary antibody diversity is sufficient to bind the vast majority of antigens 
[6, 14]. As a result of stimulation of the system by protein antigen, somatic hy- 
permutation and/or gene conversion, followed by selection occurs in order to 
generate antibodies which have a higher affinity for a particular antigen and 
can rapidly and effectively respond as immediate effectors as well as generating 
immunologic memory [6]. Structurally, this is typically understood as the pro- 
gression from an 'induced fit' for the antigen to a 'lock and key' with enhanced 
pre-binding shape complementarity [10, 23, 13]. 

The nature of the protein-protein interaction that occurs upon binding is dic- 
tated by biophysical parameters which include electrostatics, hydrogen bonds, 
hydrophobic packing, van der Waals interactions as well as shape and charge 
complementarity and cooperative binding effects. The degree of these effects 
depends upon the chemical composition of the residues of the interacting com- 
ponents [12,11]. Mutational analyses have shown the selection of clones that 
undergo a stepwise increase in affinity - an additive effect of changes that cre- 
ate new hydrogen bonds, electrostatic or hydrophobic interactions between the 
residues of the antigen epitope, the antibody variable region, and associated sol- 
vent molecules [17, 23]. However, it is observed that some codon changes cannot 
be translated into stepwise energetic changes [17] and therefore prediction of 
affinity effects by a given mutation are often not directly quantifiable. Affinity 
maturation studies of multiple systems fail to reveal a consistent optimization 
strategy [23, 9, 15, 19, 24]. Some mutations may create residues which are involved 
in long range or cooperative binding, while other mutations may be neutral from 
an affinity perspective, but may actually be permissive of subsequent affinity en- 
hancing mutations [17, 7, 20]. 

3 Model Description 

The antigen-combining site of the antibody molecule consists of six hypcrvariable 
loop structures which extend from the barrel-shaped pairing of the beta strands 
of the variable regions of the heavy and light chains. The lengths and chemical 
compositions of these loops determine their binding capabilities [27] . Canonical 
structures have been identified for the hypervariable loops, including the third 
hypcrvariable region of the heavy chain (CDRH3) which typically has the most 
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contact residues with the antigen, and is also the most variable in length and 
sequence [4,5]. 

The hypcrmutation process acts primarily on the CDR coding sequences, and 
there is ample evidence for mutation 'hotspots' and coding bias in the variable 
(V) genes [18, 16]. Certain V genes may have evolved to be more mutable, and 
others more robust [7, 16]. It is likely that the mutational process takes advantage 
of biases in the V gene sequences and that subsequent mutations would attempt 
to create flexibility in some regions to facilitate docking while other regions are 
optimized to maximize antigen-antibody interactions and stabilize binding for 
appropriate feedback and signaling to occur [7, 24, 26]. 

We define three regions of each CDR loop; region 1 representing the anchored 
ends of the loop, region 2 is the segment extending to the base of the turn of the 
loop and region 3 representing the residues of the turn. The amino acids (see 
[27] for the one-letter amino acid codes) are classified in five groupings which 
reflect some of the properties of other amino acid classification systems [2] , yet 
our groupings are based on chemical properties that emphasize their positional 
effects on entropy, enthalpy and kinetics of binding and thus the relative contri- 
butions to the strength of the attractors in the model system. 

I Aliphatics- V, L, I, M 
II Aromatics- Y, W, F, H 

III Mobile aliphatic- A, G 

IV Polar/charged, mobile- S, T, Q, C, N, E, D 
V Basic, low mobility- K, R, P 

Group I represent most of the aliphatic amino acids. These are not ideal in the 
combining site due to limited interaction potential, but participate hydrophobi- 
cally in stabilizing and stacking interactions. Their rotational degrees of freedom 
may make them better at the ends of the CDR loops and their stabilizing effects 
may improve kinetics. Group II includes the polar residue histidine which is a 
stabilizing residue, along with the aromatic residues which have a delocalized n 
electron cloud and can participate in electrostatic, van der Waals and hydrogen 
bonding. Group II are also stabilizing residues which are often buried in the 
CDR's and are favored for their interaction potential with the antigen with a 
relative small entropy loss. They would likely provide improvements in kinetics, 
particularly with regard to K g. Group III includes the aliphatics glycine and 
alanine which are small and mobile, favored at the ends of the CDR loop for mo- 
bility and possibly faster binding kinetics, but are not at the center of the loop 
due to poor interaction potential. Group IV includes polar and charged residues 
that have good interaction potential. Asparagine, which is frequently buried in 
the CDR and can stabilize through hydrogen bonding to main chain atoms, also 
allows increased exposure of aromatic side chains. These residues maintain some 
mobility and could be accommodated in most portions of the loop, particularly 
regions 2 and 3. Group V includes large, basic residues that have good inter- 
action potential, but poor mobility favors their placement in regions 2 and 3. 
Proline is included here as an aliphatic residue that tolerates H2O exposure and 
is often found at turns [4, 5, 21, 8]. 
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3.1 Shape Space 

As mentioned earlier, our intent is to model the thermodynamic relaxation of 
the antibody (Ab)/ antigen (Ag) binding interface. We achieve this goal by 
constructing a Markov Chain on the joint shape space of the CDRs and the 
Ag epitopc(s). Specifically, we restrict our 'shape' model to the first two torsion 
angles (if and ip) for each residue. 

Let s G S = {0,1,2,3} 3N denote the nucleotide sequence encoding for the 
CDRs participating in the antigen interaction and the presented antigenic epi- 
tope, where N denotes the number of triplets encoding the corresponding amino 
acid sequences (note that all introns have been 'spliced out' and the genetic 
sequences are thought of as contiguous). Translation between the four letter nu- 
cleotide sequence to the twenty letter amino acid sequence is achieved through 
the DNA genetic code: Q : S — > {1, 2, . . . ,20}^, for any chain with N amino 
acids. 

Each of the N residues in the combined Ab/Ag primary sequence s gives rise 
to a (two-dimensional) Ramachandran plot for its two primary torsion angles 
(<Pi,ipi) for i = 1,2, ...,7V. Thus, the shape of the entire combined primary 
sequence is represented as a point lo = (tpi,ipi, if2, ip2, ■ ■ ■ , <Pn, 4>n) in a subset 
Q of the iV-dimensional topological torus T n , which we refer to as 'shape space'. 

A location map t : G(S) — > {1,2,3} W is defined to capture the 'topography' 
of the loops that comprise the CDRs and the antigenic epitope as described in 
the previous section. In particular, £(Q(s))- = 1 means that the corresponding 
residue is in the area immediately attached to the conserved framework. Corre- 
spondingly, the value 3 for the location map denotes the top of the exposed loops, 
while the value 2 is reserved for residues that transition between the framework 
and the bend at the top of the loop. 

3.2 Free Energy 

We model the free energy of the Ab/Ag complex as a real-valued function 
on shape space, G : Q —> 1Z+. We construct this function locally, beginning 
with neighborhoods around M Gaussian 'shape attractors' [1,3], 'dj G fi (j = 
1,2,..., M). In particular, let 

G(c;^) = -G,(2^)- Ar / 2 dct(^)- 1 /2 CX p|_I( c ,_^f ffr* (a;- #,.)}, (i) 

denote the potential around attractor #j, where Hj is the covariance matrix 
corresponding to attractor j. In this paper we consider the special case Hj = rjl^ 
for all j = 1,2, ... ,M. Combining these local 'attraction zones', we define the 
free energy globally as 

G(u) = min Gfui'&i). (2) 

l<j<M J 

The attractor locations were based on crystallographic studies of immunoglob- 
ulins [4,5]. Figure 1 shows an example of the locations of 91 attractors for Ser- 
ine (the ellipses capture the la onfidence levels around the estimated locations, 
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Typical Ramachandran plot for Serine attractors 
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Fig. 1. Ramachandran plot for Serine data in CDRs 

where available) . The attractor strengths were assigned randomly within ranges 
that depend on the amino acid group (1-5) and its location code (1, 2 or 3) as 
shown in Table 1. 
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Table 1. Ranges of attractor strengths Gj by amino acid group and location code 



3.3 Microcanonical Ensemble 

The evolution of the joint shape of the Ab/Ag complex is seen as a divergence- 
form diffusion (see [22] and the discussion of the Smoluckowski equation in [3] 
and references therein) with generator 
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for any real-valued function / : Q — > 1Z on shape space. The self-adjoint 
extension of £p generates a Markov process in L 2 (i?, up), where np(dx) = 
Z7 1 e~ /3G ^ [io(du>) is the Gibbs state at inverse temperature j3, with fi a the 
Haar measure on T N [29]. In particular, the process converges to an equilib- 
rium state which spends time in the neighborhood of each attractor in inverse 
proportion to the free energy barriers that separate the attractors from one an- 
other. In the next section we compute those barriers and use them to define 
the thermodynamic constants that control the relaxation of the Ab/Ag binding 
pocket. 



3.4 Barriers and Thermodynamic Constants 

We use the free energy function defined in (2) to compute the 'barrier form', a 
map B : QxQ — ► Q with the property that B ($j, i?j) is the conformation u> such 
that G(uj;di) — G(uj;dj). After some algebra, using the diagonal form of the 
covariance matrix described above, we arrive at the following characterization 
of B: 

B(# i ,# j ) = # j +\ ij {# i -# j ), (4) 

where 

(■dJd j -#r# i )-2r,log% 

Aij rp • ( J 

2{0 j -# i f{# j -# i ) 

Thus, the free energy level at the barrier is given by 



G (B (di, fy) ; <?0 = -(2 7 r? ? )- JV / 2 G j exp ' v '' 



<?^-t?fl? i -2»7logSt 



2 



' (6) 

Using the procedure described in [22] and the computation shown in (6), we 
define for each attractor i, 

K oS (i)=expl-mmG(B(# i ,# j );# i )\ (7) 



K on (i) = cxp<^ -maxminG (B (fy,^) ;&j) > (8) 

P = ^M (Q) 

1 K on (i) + K oS (i) w 

M 

e = ^p(i)i og P(i). (io) 

If attractor b corresponds to the bound Ab/Ag state, then the association con- 
stant for their interaction is given by K on (b) (8) and the dissociation constant is 
given by K ff(b) (7). Similarly the immunologic affinity is measured by Pf,, and 
£ described the entropy of the particular configuration of attractors. 
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3.5 Coalescing Random Walk 

Mutations occur with equal frequency at any location in the primary sequence 
s. If the resulting triplet codes for a new amino acid (the mutation is not silent), 
then the shape attractors change location and strength. Thus, the evolution of 
the system over a series of mutations can be thought of as a random walk in Q N . 
If, after a step in this random walk, two attractors satisfy the following equation 

GWirfi) < -(2irr))- N/2 Gi <=>■ (^ - tijf (^ - 0,-) < 2r 1 \og^, (11) 

then attractor i merges with attractor j. In fact, often mutations leading to these 
'coalescing' events reduce the entropy of the system and also help improve the 
binding affinity. 

4 Results 

We amended the random walk described in the previous section by requiring that 
a mutation step be accepted only if it does not decrease the affinity, except when, 
with probability 1— p, a 'global jump' is performed, whose result is accepted (as a 
new starting point) irrespective of its effect on affinity ([28]). We have simulated 
this 'restarting' coalescing random walk for different values of r/, M and p. When 
low values of p were used (e.g. p = 0.5) the thermodynamic constants and affinity 
landscape were explored in breadth. Instead, when we used high values of p (e.g. 
p > 0.99) we were able to obtain an in-depth exploration of few affinity hills. 

Figure 2 shows the 35 accepted steps of the coalescing random walk in a 
simulation run of 5,000 steps with p — 0.999, 77 = 9 and M — 50 (only 
three global jumps were observed). The simulation was based on the sequence 
QGTHVPYTARRSWYFDVWG with 

Hg(s)) = (1,2, 2, 3, 3, 2, 2, 1,1, 2, 2, 2, 3, 3, 3, 2, 2, 2,1). 

4.1 Affinity Landscape 

As a result of our simulation of the coalescing random walk in shape space, we 
can estimate the resulting affinity landscape. In all cases, we found the affinity 
landscape to be well modeled (R 2 > 0.9) by a Gaussian. Figure 3 shows the 
normal probability plot for a simulation run withp=0.5, r\ = 8 and M = 50 using 
the sequence and location map described above (/x = 0.4716 and a = 0.0137). 

4.2 Kinetic Evolution 

Figure 4 shows the typical evolution of the kinetic components of affinity, for a 
simulation with the same example sequence as above, and p = 0.5, r\ = 8 and 
M = 50. For low values of affinity, increases in the association constant K on play 
the biggest role in maturation. Quickly however K on saturates, and any further 
increases in affinity are driven almost entirely by decreases in the dissociation 
constant k s . In particular, we can be more than 90% certain that the correlation 
between K oS and P b , p (K oS , P b ) G (-0.9, -0.86) while p (K on , P b ) e (0.5, 0.55). 
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Fig. 2. Example of affinity maturation and its kinetic components 
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Fig. 3. Normal probability plot of affinity from 3, 534 steps of a simulated random walk 
in shape space 
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Fig. 4. An example of the contributions of K ou and K g to affinity maturation. All 
time series are sorted in order of increasing affinity. Affinity was scaled up by 87% for 
visual clarity 



4.3 Entropic Maturation Regimes 

As discussed earlier, the high specificity of the mature antibody is assumed 
to reflect improved a priori shape complementarity with the antigen, therefore 
diminishing the entropy change upon binding. 

In an effort to elucidate the entropies of maturation, Figure 5 examines the 
tail of the percentage drop in entropy during an affinity maturation simulation 
following the restarting coalescing random walk with the same parameters as in 
the previous section. We see that the entropy drop follows roughly an exponential 
distribution, with rate around 1.23. 



5 Next Steps 



The modeling paradigm described here, although not capturing details of the 
mutational effects on the protein dynamics, permits the computation of the 
distributions of the thermodynamic constants as well the delineation of the con- 
tributions of the kinetic components of affinity. The system also allows analysis 
of the entropic trade-off that occurs during the conformational evolution of the 
interface. 

A further refinement of the model would include the addition of more at- 
tractors (perhaps exponentially many in the number of residues) as this would 
be a more realistic representation from a protein dynamics perspective. This 
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Fig. 5. The distribution of entropy based on 3, 534 steps of a simulated random walk 
in shape space 



version of the model represents a special case where a single r\ is used for all of 
the attractors; a further improvement would allow 77 to vary between attractors 
and to allow the full covariance matrix H to incorporate off-diagonal elements 
capturing more complex attractor geometries. 

An important addition would be to explore better the entropies of the affinity 
maturation process and the efficiency of the trade-off between decreasing entropy, 
affinity and shape complementarity for the antigen. Decreasing entropy may not 
be an absolute requirement for affinity maturation and it may represent a more 
finely tunable parameter in the maturation process (preliminary unpublished 
results not shown here) [30,31]. 
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